Rate Limiting
Learn about API traffic management and important considerations about rate limits.
Introduction to the rate limiter#
When an API becomes available to the public, there can be an influx of users and services. Anyone can use it at any time, and as much as they want, potentially crowding out legitimate users. Moreover, API providers provision limited resources per unit of time and want their services to be fairly available to all of their customers. It’s good to have our APIs used by many people, but untethered access has drawbacks. Too many requests can overwhelm API gateways; therefore, API owners enforce a limit on the number of requests, or the amount of data, coming from clients. This constraint on the number of requests or usage is enforced by a component called the API rate limiter.
The API rate limiter throttles clients’ requests that exceed the predefined limit per unit of time instead of disconnecting the clients. Throttling refers to controlling the flow by discarding some of the requests. Rate limiting can also be considered a security feature that prevents bot and DoS attacks, which can overwhelm a server with a burst of requests. Overall, rate limiting provides a protective layer when a large number of requests per unit of time (a spike or thundering herd) is directed at an API.
Point to Ponder
Question
These days, businesses earn money when more people use their APIs (actually, businesses earn something per API call). Aren’t we making a business lose money by using a rate limiter to limit the number of requests?
The purpose of the rate limiter is not to throttle requests without reason, but to average out the spikes to avoid possible service overload and keep the traffic within provisioned computational resources. Due to the rate limiter, every client is given a fair share of a service quota.
Responsibilities of a rate limiter#
In addition to serving as a security feature, a rate limiter should also provide the following capabilities to manage API traffic:
Consumption quota: Specifies the maximum number of API calls to the backend that a certain application is permitted to make in a particular time frame. API calls that go beyond the allotted quota may be throttled.
Spike arrest: Recognizes and prevents an unexpected rise in API traffic. It also protects the back-end system and reduces performance lags and downtime.
Usage throttling: Slows down excessive API calls from a specific user within a specified time frame. Usage throttling improves the overall performance and reduces the impact on other users, especially during peak hours.
Traffic prioritization: Prioritizes incoming requests depending on their criticality, ensuring a balanced and stable API service. For example, an API can be used by both freemium and premium users, and the two types of users should get different API usage quotas.
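As a minimal sketch of traffic prioritization by quota, the snippet below maps user tiers to different request allowances. The tier names and limit values are illustrative assumptions, not from any specific product:

```python
# Hypothetical per-tier quotas so premium users get a larger share.
# The tier names and hourly limits below are illustrative.
TIER_QUOTAS = {
    "freemium": 100,   # requests per hour
    "premium": 5000,   # requests per hour
}

def quota_for(user_tier: str) -> int:
    """Return the hourly request quota for a user's tier (default: freemium)."""
    return TIER_QUOTAS.get(user_tier, TIER_QUOTAS["freemium"])
```

A rate limiter would consult such a lookup when deciding a client’s per-window limit, so premium traffic is prioritized without starving freemium users entirely.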
Working of a rate limiter#
Whenever a rate limiter receives an API request, it looks up the client’s information in the database or cache, including the number of requests that client is allowed to make. If the count of requests already made is less than the maximum number of requests allowed, the current request is forwarded to the servers, and the count is incremented. For example, as shown in the following illustration, a client with “ID: 101” makes a request for a service. The rate limiter allows this request because the “Count” for the incoming request is “3,” which is less than the maximum number of allowed requests per unit of time (“5”).
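The flow described above can be sketched as a simple counter per client. This is a minimal in-process version; a real deployment would keep the counts in a shared high-speed store (such as Redis) rather than a local dict, and the class name and limits here are illustrative:

```python
import time

class SimpleRateLimiter:
    """Minimal per-client counter with a fixed time window (a sketch,
    not a production design). Uses an in-process dict as the 'cache'."""

    def __init__(self, max_requests: int = 5, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window = window_seconds
        self.counts = {}  # client_id -> (window_start, count)

    def allow(self, client_id: str) -> bool:
        now = time.time()
        start, count = self.counts.get(client_id, (now, 0))
        if now - start >= self.window:       # window expired: reset the count
            start, count = now, 0
        if count < self.max_requests:        # under the limit: forward request
            self.counts[client_id] = (start, count + 1)
            return True
        return False                         # limit reached: throttle
```

With `max_requests=5`, a client like “ID: 101” whose count is 3 is allowed, while its sixth request inside the same window is throttled.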
Point to Ponder
Question
If each request goes through the rate limiter, then the rate limiter is on the critical path and can add latency for the user. How can we manage this problem?
For a rate limiter to avoid adding latency, we should design it to utilize a high-speed cache and do some work offline (off the client’s critical path). For more details, see Educative’s course on system design.
Where to place a rate limiter#
A rate limiter can be placed in three different locations:
On the client side: The simplest way is to put a rate limiter on the client side. However, this makes the rate limiter vulnerable to malicious activity, and the service provider’s configuration can’t be easily applied. Additionally, clients can tamper with the rate-limiting values and send as many requests as they want, defeating its purpose.
On the server-side: A rate limiter can be placed on the server-side within the API server. This way, the same server that provides API services can also handle rate-limiting complexities.
As a middleware: Another approach to place a rate limiter is to isolate it from front-end and back-end servers and place it as a middleware. This way, the rate-limiting services are isolated from the rest of the activities taking place in the system.
From the discussion above, we can conclude that placing a rate limiter on either the server side or the middleware is suitable. Moving forward, our discussion will apply to either of these two approaches, because we’ll focus on the rate limiter itself rather than where to place it.
Characteristics of rate-limiting#
When an API becomes popular and the API owners see a surge of traffic that could affect the availability of the application, they may start exploring rate-limiting mechanisms. The primary goal of the rate limiter is to protect the infrastructure and product. While implementing a rate-limiting system, we should provide the following characteristics to make consumers' lives easier:
Return the appropriate HTTP status code: Whenever a request is throttled, the system should return an HTTP 429 status code, indicating that the number of incoming requests has reached the predefined limit for a given amount of time. It’s also standard practice to let developers know when they can try the request again by setting the Retry-After header.
Rate-limit custom response HTTP headers: Apart from the status code, we should include some other custom response headers about the rate limit. These headers help developers when they need to retry their requests. They include the following:
X-RateLimit-Limit: This header represents the maximum number of requests allowed for a particular endpoint within a specific amount of time.
X-RateLimit-Remaining: This header represents the number of requests still available to the developer in the current time window.
X-RateLimit-Reset: This header indicates the time at which the current time window resets, in UTC epoch seconds.
X-RateLimit-Resource: This header indicates the resource name for which the rate limit status is returned.
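On the server side, a throttled response would carry the 429 status code together with these headers. The sketch below is a hypothetical helper (the function name, limit, and values are illustrative, not tied to any framework):

```python
import time

def throttled_response(limit: int, reset_epoch: int) -> dict:
    """Sketch: build a 429 response carrying the standard rate-limit headers.
    Returns a plain dict standing in for a framework's response object."""
    return {
        "status": 429,
        "headers": {
            # Seconds the client should wait before retrying:
            "Retry-After": str(max(0, reset_epoch - int(time.time()))),
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": "0",     # quota exhausted in this window
            "X-RateLimit-Reset": str(reset_epoch),
        },
        "body": "Too Many Requests",
    }
```

A real service would emit these on its actual response object, but the header names and semantics are the same.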
We can execute the following command in the terminal below to display the above headers and their values.
The curl command is used to transmit data to and from a server. Here, we’re accessing the GitHub server in the command using our own educative repository. In the provided command, -I tells curl to fetch only the response headers and exclude the response body. Try it in the terminal below:
As shown below, X-RateLimit-Limit in the output shows the maximum number of requests a user can make to the server in an hour. X-RateLimit-Remaining is the number of requests we can still make in the current window; this value is decremented each time we execute the given command (in other words, each time we access the server). The quota is reset at the time given by X-RateLimit-Reset, a timestamp in the future. Additionally, the value of X-RateLimit-Resource is core, which represents the core GitHub resource.
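A client can interpret these headers to decide when its quota window resets. The sketch below uses illustrative header values shaped like the ones above (the function name and numbers are assumptions for the example):

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds a client should wait before its quota window resets,
    based on the X-RateLimit-Reset header (UTC epoch seconds)."""
    if now is None:
        now = int(time.time())
    return max(0, int(headers["X-RateLimit-Reset"]) - now)

# Illustrative header values, shaped like the curl -I output discussed above:
headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "58",
    "X-RateLimit-Reset": "1700000000",
    "X-RateLimit-Resource": "core",
}
```

For instance, if the current epoch time were 1699999940, `seconds_until_reset(headers, now=1699999940)` would report 60 seconds until the window resets.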
Rate-limit status API: If there are different APIs for different endpoints, there should be a way for clients to query the limit status of various API endpoints.
Documenting rate limits: Documenting rate-limit values can help users make the right architectural choices. This will allow them to learn about the APIs before getting trapped in the rate-limit errors.
Rate-limiting best practices#
Appropriate rate-limiting algorithms: One good practice is for the server to pick a rate-limiting algorithm based on the traffic pattern it serves. There are various rate-limiting algorithms to choose from, for example:
Token bucket
Leaking bucket
Fixed window counter
Sliding window log
Sliding window counter
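As a concrete example of one of these algorithms, here is a minimal token bucket sketch. Tokens refill at a fixed rate up to a capacity, each request consumes one token, and short bursts up to the capacity are allowed (the class name and parameters are illustrative):

```python
import time

class TokenBucket:
    """Token bucket sketch: tokens refill continuously at `refill_rate`
    per second, up to `capacity`; each allowed request spends one token."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add the tokens accrued since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The other algorithms trade off differently: fixed window counters are cheapest but allow edge bursts, while sliding window variants smooth the boundary at higher bookkeeping cost.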
Rate-limiting threshold: The rate-limit threshold should be chosen carefully, depending on the number of users of an API and the average behavior of a typical customer. For example, a social media application can have several million users, so the rate-limit value should be kept relatively high. These thresholds might need to be updated over time as workload patterns change.
Documentation for developers: We should offer clear guidance to developers about the rate-limit thresholds and how they can request additional quotas so that they can continue their work without any hurdles.
Smooth rate-limit thresholds: Starting with lower rate-limit quotas is helpful because they can be increased gradually, rather than allowing generous quotas initially and then decreasing them later as the user base grows. Reducing rate-limit values later might negatively impact clients, as they become used to a certain number of requests in a certain period.
Exponential back-off mechanism: Simultaneous requests in a time window can overburden a server, which may result in failed requests. To avoid flooding the server with immediate retries, we can progressively increase the wait time between retries after consecutive error responses. This technique is called exponential backoff. Another good practice is to implement exponential backoff in our client software development kits (SDKs) and provide sample code to developers on how to do that.
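A common variant of this technique is exponential backoff with “full jitter”: the wait ceiling doubles with each consecutive failure, and the actual delay is drawn randomly below that ceiling so that many clients don’t retry in lockstep. The function name and default values below are illustrative:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: return a random delay between
    0 and min(cap, base * 2**attempt) seconds before retry `attempt`."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# A client retry loop would sleep for backoff_delay(attempt) after each
# 429 response, with the ceiling doubling on every consecutive failure.
```

The jitter matters: without it, clients that failed together retry together, recreating the very spike the rate limiter rejected.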
Point to Ponder
Question
Exponential backoffs can cause an increase in latency. Why should we still use them?
At the individual level, it might cause latency, but this approach is collectively better for everyone because it decreases the request failure rate.
Summary#
Rate limiting helps organizations stay in control so that their service is used in a predictable way. A rate limiter not only provides a protective layer against malicious actors but also aids in traffic management. It also serves as a kill switch to control any runaway or compromised accounts. In addition, rate limiters enable API developers to throttle the services provided to third-party applications. Depending on the application requirements, various rate-limiting algorithms (mentioned earlier in the lesson) can be used.
Quiz#
Quiz on API rate limiters
Which option is not the responsibility of the rate limiter?
Spike arrest
Traffic prioritization
Traffic redirection
All of the above
Explanation: Spike arrest and traffic prioritization are responsibilities of the rate limiter, whereas traffic redirection isn’t.